Efficient Computation of Unconditional Error Rate Estimators for Learning Algorithms and an Application to a Biomedical Data Set (Master Thesis)
Author
Abstract
We derive an unbiased variance estimator for re-sampling procedures, using the fact that those procedures are incomplete U-statistics. Our approach is based on a careful examination of the combinatorics governing the covariances between re-sampling iterations. We establish such an unbiased variance estimator for the special case of K-fold cross-validation. This estimator exists as soon as new observations are added to the original sample, and we specify how many additional observations are necessary. This makes re-sampling procedures comparable. We make no assumptions about the underlying distribution, and we take the covariances between re-sampling iterations into account. Beyond that, we show an approach for finding a re-sampling design with minimal variance for a fixed size of the learning sets. We show empirically that designs with smaller variance than repeated cross-validation exist. We also compare systematically with the complete U-statistic, the leave-p-out estimator. Our examination is completed by an application to microarray data.
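The relation between the complete and the incomplete statistic mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's variance estimator. It only shows that leave-p-out averages the test error over all C(n, p) test sets, while one run of K-fold cross-validation with p = n/K averages over just K of them. The toy learner `fit_nearest_mean`, the helper names, and the synthetic data are assumptions made purely for illustration.

```python
import itertools
import random


def fit_nearest_mean(train):
    """Toy learner (hypothetical, for illustration only): predict the class
    whose training mean is closest to x."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if not xs0 or not xs1:
        # Degenerate learning set: always predict the only class present.
        only = 1 if xs1 else 0
        return lambda x: only
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0


def split_error(data, test_idx):
    """Error rate on the test indices after learning on all other points."""
    test_set = set(test_idx)
    train = [data[i] for i in range(len(data)) if i not in test_set]
    predict = fit_nearest_mean(train)
    wrong = sum(predict(data[i][0]) != data[i][1] for i in test_idx)
    return wrong / len(test_idx)


def leave_p_out(data, p):
    """Complete statistic: average over all C(n, p) test sets."""
    splits = list(itertools.combinations(range(len(data)), p))
    return sum(split_error(data, s) for s in splits) / len(splits)


def k_fold(data, k, rng):
    """Incomplete statistic: average over the k folds of one random partition."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return sum(split_error(data, f) for f in folds) / k


rng = random.Random(0)
# Synthetic sample with n = 12 observations (x, y); not data from the thesis.
data = [(rng.gauss(2.0 * y, 1.0), y) for y in [0, 1] * 6]
lpo = leave_p_out(data, 3)  # averages over all 220 test sets of size 3
cv = k_fold(data, 4, rng)   # averages over only 4 of those 220 test sets
print(lpo, cv)
```

With K = n the folds are single observations, so `k_fold` coincides with `leave_p_out(data, 1)`; for smaller K the two generally differ because K-fold uses only a subset of the possible splits.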
Similar resources
Blind Performance Estimation and Quantizer Design with Applications to Relay Networks
In this thesis, we introduce blind estimators for several performance metrics of Bayesian detectors, we study rate-information-optimal quantization and introduce algorithms for quantizer design in the communications context, and we apply our results to a relay-based cooperative communication scheme. After a discussion of the background material which serves as a basis for this thesis, we study ...
Machine learning algorithms for time series in financial markets
This research is related to the usefulness of different machine learning methods in forecasting time series on financial markets. The main issue in this field is that economic managers and scientific society are still longing for more accurate forecasting algorithms. Fulfilling this request leads to an increase in forecasting quality and, therefore, more profitability and efficiency. In this pa...
Application of the Extreme Learning Machine for Modeling the Bead Geometry in Gas Metal Arc Welding Process
Rapid prototyping (RP) methods are used to produce a scale model of a physical part or assembly easily and quickly. Gas metal arc welding (GMAW) is a widespread process used for rapid prototyping of metallic parts. In this process, in order to obtain a desired welding geometry, it is very important to predict the weld bead geometry based on the input process parameters, which are voltage...
Efficient Approximation Algorithms for Point-set Diameter in Higher Dimensions
We study the problem of computing the diameter of a set of $n$ points in $d$-dimensional Euclidean space for a fixed dimension $d$, and propose a new $(1+\varepsilon)$-approximation algorithm with $O(n+ 1/\varepsilon^{d-1})$ time and $O(n)$ space, where $0 < \varepsilon \leqslant 1$. We also show that the proposed algorithm can be modified to a $(1+O(\varepsilon))$-approximation algorithm with $O(n+...
Proposing an Efficient Software-Based Method for Enhancing the Reliability of Critical Application Robot
Robots play such remarkable roles in humans’ modern lives that performing many tasks without them is impossible. As the use of robotic systems grows, the tasks allocated to them are increasing and becoming more complex and critical. Software reliability is one of the most significant requirements of robots. To enhance reliability, systems should be inherently designed to be tolerant of soft...